Skip to content

Fix union-types when aggregating on inline conversion function#110652

Merged
craigtaverner merged 1 commit intoelastic:mainfrom
craigtaverner:union_types_aggs_inline_conversion
Jul 10, 2024
Merged

Fix union-types when aggregating on inline conversion function#110652
craigtaverner merged 1 commit intoelastic:mainfrom
craigtaverner:union_types_aggs_inline_conversion

Conversation

@craigtaverner
Copy link
Copy Markdown
Contributor

@craigtaverner craigtaverner commented Jul 9, 2024

Fix union-types when aggregating on inline conversion function
A query like:

FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip

This fails due to unresolved aggregates from the union-type in the grouping key. The fix was to find the FieldAttribute/InvalidMappedField combination and convert it to UnresolvedAttribute, so it can be found and resolved by the existing fix from #110476

Fixes #110627

@craigtaverner craigtaverner added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Jul 9, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key
@craigtaverner craigtaverner force-pushed the union_types_aggs_inline_conversion branch from 57f699e to 2a015b7 Compare July 9, 2024 15:45
@craigtaverner craigtaverner changed the title Fix union types aggs grouping on inline conversion function Fix union-types when aggregating on inline conversion function Jul 9, 2024
7 | 2
;

multiIndexTsLongStatsStats
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the test that tests the fix in the PR; but is a useful test that was used to flush out some related behaviors, so I wanted to keep it. For the test that really matters, look up at line 284, which uses the capability union_types_inline_fix, and was already merged as a result of the previous fix, but had the wrong capability name, and so was ignored and we missed the failure. This PR adds that capability name, enabling this test, and fixes the test.

/**
* Fix for union-types when aggregating over an inline conversion with conversion function. Done in #110652.
*/
UNION_TYPES_INLINE_FIX;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test on line 284 of union_types.csv-spec was already requiring this capability (and therefor ignored). We now enabled this capability and fix the failing test that resulted.

@craigtaverner craigtaverner requested a review from alex-spies July 9, 2024 15:54
Copy link
Copy Markdown
Member

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@craigtaverner craigtaverner merged commit 4ddca13 into elastic:main Jul 10, 2024
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jul 15, 2024
…ic#110652)

A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key
craigtaverner added a commit that referenced this pull request Jul 16, 2024
* Fix bug in union-types with type-casting in grouping key of STATS (#110476)

* Allow auto-generated type-cast fields in CsvTests

This allows, for example, a csv-spec test result header like `client_ip::ip:ip`, which is generated with a command like `STATS count=count(*) BY client_ip::ip`

It is also a small cleanup of the header parsing code, since it was using Strings.split() in an odd way.

* Fix bug in union-types with type-casting in grouping key of STATS

* Update docs/changelog/110476.yaml

* Added casting_operator required capability

Using the new `::` syntax requires disabling support for older versions in multi-cluster tests.

* Added more tests for inline stats over long/datetime

* Trying to fix the STATS...STATS bug

This makes two changes:

* Keeps the Alias in the aggs.aggregates from the grouping key, so that ReplaceStatsNestedExpressionWithEval still works
* Adds explicit support for union-types conversion at grouping key loading in the ordinalGroupingOperatorFactory

Neither fix the particular edge case, but do seem correct

* Added EsqlCapability for this change

So that mixed cluster tests don't fail these new queries.

* Fix InsertFieldExtract for union types

Union types require a FieldExtractExec to be performed first thing at
the bottom of local physical plans.

In queries like
```
  from testidx*
  | eval x = to_string(client_ip)
  | stats c = count(*) by x
  | keep c
```
The `stats` has the grouping `x` but the aggregates get pruned to just
`c`. In cases like this, we did not insert a FieldExtractExec, which
this fixes.

* Revert query that previously failed

With Alex's fix, this query now passes.

* Revert integration of union-types to ordinals aggregator

This is because we have not found a test case that actually demonstrates this is necessary.

* More tests that would fail without the latest fix

* Correct code style

* Fix failing case when aggregating on union-type with invalid grouping key

* Capabilities restrictions on the new YML tests

* Update docs/changelog/110476.yaml

---------

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>

* An alternative approach to supporting union-types on stats grouping field (#110600)

* Added union-types field extration to ordinals aggregation

* Revert previous approach to getting union-types working in aggregations

Where the grouping field is erased by later commands, like a subsequent stats.
Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.

* Fix union-types when aggregating on inline conversion function (#110652)

A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key

* Fix for union-types for multiple columns with the same name (#110793)

* Make union types use unique attribute names

* Cleanup leftover

* Added failing test and final fix to EsRelation

* Implement FieldAttribute.fieldName()

* Fix tests

* Refactor

* Do not ignore union typed field's parent

* Fix important typo

D'oh

* Mute unrelated (part of) test

* Move capability to better location

* Fix analyzer tests

* multi-node tests with an earlier version of union-types (before this change) fail

* Add capability to remaining failing tests

* Remove variable

* Add more complex test

* Consolidate union type cleanup rules

* Add 3 more required_capability's to make CI happy

* Update caps for union type subfield yaml tests

* Update docs/changelog/110793.yaml

* Refined changelog text

* Mute BWC for 8.15.0 for failing YAML tests

* union_types_remove_fields for all 160_union_types tests

The tests fail spordically, so safer to mute the entire suite.

---------

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>

---------

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
tvernum pushed a commit that referenced this pull request Feb 25, 2025
A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key
tvernum pushed a commit that referenced this pull request Feb 25, 2025
A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Another failing edge case for union-types and stats grouping on inline type conversions

3 participants